Preface


The promise of autonomous vehicles 


The promise of fully autonomous cars that drive themselves has beckoned 
for decades. It is typical for an American to spend an hour a day in a car, 
with much of that time spent in a relatively unpleasant commute to work 
rather than enjoying the idyllic lure of an open road adventure that is so 
baked into the culture. Wouldn’t it be nice if we could watch a movie, take a 
nap, or otherwise relax instead of jockeying for position with all the other 
commuters? Or maybe we could go to sleep in our garage and wake up in 
another city, our personal luxury transportation pod having let us sleep away 
the boring hours of an all-night drive to the next business meeting. 

Other potential applications for autonomous vehicle (AV) technology 
abound. They include a potential major restructuring of long-haul trucking, 
parcel delivery, public transportation, and in particular, dramatically 
increased access to transportation for those who cannot drive. There are 
tradeoffs involved, and it remains to be seen how things will play out. But 
with tens of billions of dollars pouring into investments in the technology, 
expectations are set high. 

A salient AV promise is dramatically improving road safety. Indeed, the 
lead selling point has come to be that deploying the technology is urgent 
because every year it is delayed means more people die on our roadways. 

However, the topic of safety is far more complicated than the facile 
talking points usually involved, such as “computers won’t drive drunk so of 
course they will be safer than human drivers.”¹ On the other hand, it is
unreasonable to expect AVs to be perfectly safe. Rather, AVs should be 
acceptably safe, achieving some balance between the benefits they provide 
and the risk they impose on society. 

A common notion is that AVs will be safe enough if they are better than 
human drivers. While intuitively appealing, that simple criterion is unlikely 
to work in practice. First, “safer than a human driver” is much more 
complicated than it might seem if you need to address which driver, 
operating where, and under what conditions. Second, other considerations 
need to be addressed such as how much redistribution of risk is permissible. 
Is it OK to kill twice as many pedestrians if total fatalities, including
passengers, decrease? And third, the technology is so immature that
predicting the safety of an AV before it is deployed is a major challenge. 

In reality, nobody yet knows if AVs will be as safe as human drivers when 
deployed at scale. We hope that will be the case, but it might not even be 
possible with our current technical abilities beyond a small number of benign 
environments with highly constrained capabilities. Or we might just be 
another few billion dollars of investment and a year away from self-driving
utopia.² Regardless, we would like to do better than deploy with insufficient
evidence that the technology is safe and just see how it turns out.

1 This particular fallacy is debunked in section 4.8.2.

Being able to know that an AV is safe enough before we have experience 
with at-scale deployment is no easy problem. But if we cut corners on being 
able to ensure acceptable safety, it seems likely that high-profile crashes are 
inevitable. A pattern of such crashes — or even one especially horrific event — 
might cause society to reject the technology, setting back progress a decade 
or more.³

AVs have been the technology of the future for decades. Maybe this will 
be the time the technology really deploys at scale. Certainly I’d welcome
improved road safety and the ability to relax instead of having to concentrate 
on drives I find boring. But this technology will not be viable if the public 
does not find it acceptably safe. To get there we need to understand not only 
what acceptably safe means, but also how we might measure whether we are 
there or not. That is the scope of this book. 

An important scope disclaimer is in order. This book does not address the 
significant challenges involved in taking a machine learning-based 
technology and making it safe. That topic is crucially important, but is an 
entirely different area that is still in flux. So this is not about telling anyone 
how to design a safe AV. Rather it is about how to structure a way to 
evaluate whether the designers have actually achieved their goal of being 
safe enough. 


Why should I listen to this guy? 


I started working on AV safety in the mid-1990s as a member of the 
Carnegie Mellon University Navlab team as part of the Automated Highway 
Systems (AHS) project run by US DOT Federal Highways.⁴ That was years
before the DARPA grand challenges. The work culminated in a 1997 demo
on a closed highway in San Diego.⁵ At the AHS demo, Carnegie Mellon
demonstrated camera-based lane following technology on not only cars, but
also a pair of city buses. Berkeley PATH demonstrated platooned cars guided
by magnets embedded in the roadway. A number of other organizations also
produced useful technology and engineering analysis,⁶ but in the end there
was no planned path forward, and the idea went dormant in the public eye for 
almost a decade before DARPA picked up the topic. 


2 Unlikely to happen. More likely, it will be many years and many more tens of
billions of dollars before this technology can deploy at scale. 

3 While different in many ways, the history of the nuclear power industry is an 
important cautionary tale for what happens when high-profile loss events occur after 
society has been assured that a technology is safe. 

4 See this AHS status report by Lay et al. from 1996:
https://rosap.ntl.bts.gov/view/dot/38381

5 See Thorpe, Jochem & Pomerleau, 1997:
https://www.ri.cmu.edu/pub_files/pub2/thorpe_charles_1997_1/thorpe_charles_1997_1.pdf

6 See Bishop, Dopart & Shladover 1997:
https://path.berkeley.edu/sites/default/files/demo97foravs17v6.pdf


I was not part of the DARPA Grand Challenges, but I was involved in 
other ground robotics safety and robustness via work with a team at the 
National Robotics Engineering Center (NREC) at Carnegie Mellon 
University.⁷ NREC and its parent Robotics Institute have produced many of
the key players in the AV industry today. However, I’m not a “robo-grad” as 
they are called. Rather, I have worked at the engineering school with a 
concentration on dependability and safety. During the initial AHS project and 
later the decade or so I spent working with NREC, I learned about 
autonomous vehicles and spent a lot of time thinking about safety. 

I also have considerable experience with non-autonomous embedded 
system software design practices and safety in a number of other industries. 
I’ve had research funding, industry experience, and hundreds of consulting 
engagements covering conventional automotive, railway, chemical process, 
aviation, factory automation, building automation, vertical transportation, 
electrical power, consumer goods, combat systems, chip design, and even 
medical applications. I have also dealt with safety standards across those 
fields. Additionally, I have up-close and personal lived experience with 
applied safety practices from my time as a US Navy submarine officer, 
where so many things must be done perfectly if you and your shipmates want 
to avoid having a very bad day. Finally, I have seen the inside of a courtroom 
and other legal processes while working as an expert witness.⁸

More recently I have become involved in safety standards specific to AVs 
and processes that are creating AV regulations. Sensing a reluctance of the 
industry to commit to AV-specific standards, I spearheaded an effort to 
create ANSI/UL 4600, which is aimed at ensuring that an AV is acceptably 
safe.⁹ As I write this, ANSI/UL 4600 is well on its way to being updated to a
third edition to fully encompass not only light vehicles, but also heavy
trucks.¹⁰ I am also active on several other industry standards committees that
deal with conventional and autonomous vehicle safety. 

After those experiences and more, I feel that I have as much visibility and 
insight into the problems with AV safety as anyone can have in such a fast- 
moving and secretive world. I hope that this book makes it easier for others 
to understand the various challenges and potential solutions I’ve seen along 
the way. 


Audience 


This book is intended to be useful for a wide range of stakeholders who
are interested in ensuring that AV technology is deployed safely. That
includes engineers, regulators, legislative technical staff, government affairs
staff, insurers, technical journalists, students, mobility experts, and
technology enthusiasts. Rather than attempting to write for a single uniform
audience, each section is written in a way that is as accessible as I can
manage while still not holding back on detail relevant to deeper
understanding of the core issues for specialists.

7 See: https://www.nrec.ri.cmu.edu/

8 For a one-hour lecture on what I learned in one such case, see:
https://youtu.be/DKHa7rxkvK8

9 See Koopman et al. 2019:
https://users.ece.cmu.edu/~koopman/pubs/Koopman19_WAISE_UL4600.pdf

10 For a simple starting page for ANSI/UL 4600 information, see:
https://users.ece.cmu.edu/~koopman/ul4600/index.html

Depending on the reader’s background some sections will be more 
accessible than others. I’ve provided summaries for second and some third- 
level headings to follow the main flow for those who might find some 
sections less relevant to their needs. If you find your eyes glazing over at 
some point, feel free to skip ahead to the next summary paragraph to get a 
change of topic. If you are relatively new to the area of AV safety in general, 
you might want to start with my free video short course on AV safety to get 
up to speed before diving into the details in this book.¹¹


Book Organization 


The chapters of the book are organized as follows: 


• Chapter 1 provides a light introduction and whirlwind tour of the
material in the book.

• Chapter 2 goes over terminology, vehicle automation modes, and key
safety challenges that need to be addressed to be able to say an AV is
acceptably safe.

• Chapter 3 covers risk acceptance frameworks. It turns out there is more
than one way to frame the question of what risk might be acceptable.

• Chapter 4 covers what people mean by “safe.” This chapter is a result of
having been in too many discussions where people were talking past
each other meaning completely different things by the word “safe.”¹² It
also includes a list of misleading industry-promoted talking points that
are harmful to productive discussion about acceptable safety.

• Chapter 5 discusses how to set an acceptably safe goal, including setting
a comparative safety baseline and accounting for things beyond simply
total number of fatalities involved in crashes.

• Chapter 6 discusses how to measure and predict safety in more detail. It
is not enough to count up the losses after the fact. There needs to be a
way to build confidence before deploying.

• Chapter 7 covers safety cases and how Safety Performance Indicators
(SPIs) can be integrated into safety cases to provide safety metrics
supporting a “safe enough” decision-making process. I believe the
concept of SPIs as presented will be a key to deploying AVs safely at
scale.

• Chapter 8 deals with how to identify, monitor, and respond to SPI-based
metrics. Coming up with an actionable and measurable safety case is the
hard part. This book frames the situation, but is not a deep dive into the
nuances of safety case construction itself.

• Chapter 9 (finally!) outlines how to decide when it is ethically
responsible to deploy an AV despite inevitable uncertainty. It also covers
how to ensure acceptably safe road testing, which is more difficult to do
safely than simply putting a driver in the vehicle and telling them not to
crash.

• Chapter 10 discusses some ethical issues relevant to AV safety that will
need to be addressed before the technology can be deployed at scale,
including regulatory considerations. Spoiler: the infamous Trolley
Problem is not what we should be spending time talking about.

• Chapter 11 wraps up, presenting pointers to resources readers might find
helpful.

11 Short course lectures are hosted with open access both on YouTube and
Archive.org, including both video lectures and slides:
https://users.ece.cmu.edu/~koopman/lectures/index.html#av

12 Too often, I myself have been a participant in the talking-past exercise. This
chapter is in part a reflection to help me get better at not doing that. More
importantly, if we don’t even know what we mean by “safe” we cannot have an
intelligent discussion about “how safe.”


Writing Practicalities 


This book is more of a discussion and not an academic review paper. It is 
light on references not directly relevant to the discussion, and even has a bit 
of snark to lighten things up at times.¹³ You will not find an exhaustive
literature survey here, but rather mentions of things that have caught my eye 
as being especially relevant. There is a bit of redundancy across some 
sections because some topics interact with multiple other topics. I’ve tried to 
cross-reference and shorten overlapping discussions, but there is no perfect 
solution for this. If you think I really missed the boat on something let me
know.¹⁴

Part of the informality is that many references are to information on the 
Web, with an emphasis on finding as many open access sources as possible 
rather than paywalled material. Rather than spend time on tedious (and often 
elusive) formal citations, I’ve added URLs. If a URL goes stale, readers are 
encouraged to look up the history of the URL via the Wayback Machine at: 
https://archive.org/ to recover the relevant content. All URLs listed have 
been checked as being active as of July 2022. 

Some footnotes refer to Wikipedia and other non-authoritative sources. 
These references are made because the particular material cited seems like a 
reasonable starting point for those who want to understand more about a 
topic. They are not meant as a definitive justification for a point being made. 
Readers are cautioned that material on Wikipedia might not always be 
accurate. 

I’ve used footnotes to try to avoid derailing the main discussion with
parentheticals and to provide references to resources that are freely
accessible to the maximum degree practical. Rather than typing in all the
URLs you can find a web page with clickable links here:
https://users.ece.cmu.edu/~koopman/SafeEnough/

13 Safety of life-critical systems is in fact serious business. But education without at
least a little humor is ineffective, even if the topic is serious.

14 We also use the “royal we” in subsequent chapters. It is really just me talking.

Some of the contents of this book have appeared in various forms in blog 
entries, web pages, and the like. However, the new material is substantial, 
and even existing material has been edited or even rewritten for this book. 
This book is definitely not just a rehash of blog posts — not by a long shot. 

Finally, examples use English and Metric units more or less at random in 
an attempt to appeal to users of both systems.¹⁵ Discussions of regulatory
matters emphasize what is happening in the US. Regulatory challenges 
outside the US differ in the details, but those differences are not central to the 
message of this book. 


Acknowledgments 


While the book has been written recently, the path has been long and 
winding. I thank the following with special recognition for their contributions 
both direct and indirect on this path: Michael Barr, Michelle Bayouth, Ensar 
Becic, Sagar Behere, Jen Black, Simon Burton, Missy Cummings, Rami 
Debouk, Wes Doonan, Jackie Erickson, Uma Ferrell, Frank Fratrik, Tom 
Fuhrman, Mallory Graydon, Glen Haydon, Mahmood Hikmet, Daniel 
Hinkle, Yoav Hollander, Michael Holloway, Casidhe Hutchison, Rolf 
Johansson, Aaron Kane, Tim Kelly, John Knight, Katina Michael, Joe Miller, 
Beth Osyk, Brendon Ouimette, Fred Perkins, Jens Pollmer, Deborah Prince, 
Justin Ray, Paula Ranallo, Heather Sakellariou, Steve Shladover, Dan 
Siewiorek, Don Slavik, Zhongxin Sun, Chuck Thorpe, Kim Wasson, Jack 
Weast, Chuck Weinstock, William Widen, Marilyn Wolf, Junko Yoshida, 
David Zipper, Membership of IFIP WG 10.4, and Contributors to ANSI/UL 
4600. 


Everyone can improve, and I’m no exception. If you see something in this 
book that you disagree with, something exceptionally relevant I did not cover 
or, worse, an outright mistake, please let me know via an e-mail to: 
AVSafety@Koopman.us 


Philip Koopman 
Pittsburgh, PA, September 2022. 


15 That is my story, and I’m sticking to it. Consider it trying to be fair to readers of


1. Introduction


Just make sure the autonomous vehicle is at least as safe as a human 
driver. Really, how hard can it be to figure that out? 


The fact that you are reading an entire book on the topic of “safe enough” 
should be a hint that this problem gets more complex the deeper you go. Find 
a comfy reading place and let’s dig in. 

This book attempts to find answers to a question that seems simple on its 
face: how will we know when Autonomous Vehicles (AVs) are safe enough 
to deploy? 

To answer that question we touch upon terminology, why AV safety is 
difficult, the nature of safety vs. risk, what “enough” might mean for safety, 
dealing with uncertainty, metrics, decision criteria, regulations, and 
accompanying ethical issues. While we go through those topics, do not forget 
that the point of all this is to be able to answer the question: “Is this AV safe 
enough to deploy on public roads?” We seek answers based on knowledge 
and engineering rigor rather than hope, faith, bluster or willful ignorance. 


1.1. Scope 


The scope of this book is a discussion of how to determine that an AV is 
safe enough to deploy. While there is an overview of many of the technical 
challenges to creating an AV, the emphasis is more about how to measure 
whether the result is acceptably safe. That includes defining a risk framework 
as well as metrics that can both predict safety and provide a traceable path to 
the AV design and validation. 

Later chapters have a fair amount of detail on the use of Safety 
Performance Indicators (SPIs) as they are conceived in the ANSI/UL 4600 
safety standard.¹⁶ While there might be other ways to accurately predict AV
safety before deployment, an SPI-based approach is the way we think will 
work best. 

Topics in scope tend to be in the areas of safety engineering, metrics, 
ethics, and regulatory approaches. This book encompasses topics relevant to 
company leadership, regulators, and other stakeholders having a way to 
know that any decision to deploy an AV or test it on public roads is being 
made in a responsible manner after considering relevant factors. 

Topics out of scope for the book are details regarding machine learning
validation, software safety, and technical details of suitable arguments that
might be put into a safety case. Much of that can be found in ANSI/UL 4600,
but it is not our emphasis here. What ANSI/UL 4600 puts out of scope is a
framework for deciding how safe is acceptably safe. That is what we do in
this book.

16 See materials at: https://users.ece.cmu.edu/~koopman/ul4600/index.html


1.2. A whirlwind tour 


Here is a whirlwind tour of the contents of the book. Buckle your seat 
belts! 


Chapter 2: 


To understand how safe we need an autonomous vehicle (AV) to be, we 
first need to understand what we actually mean by “autonomous.” We take 
that to mean a situation in which there is no natural person immediately 
responsible for safety regarding the “self-driving” part of the vehicle. While
it is traditional to use the infamous SAE levels in this discussion, we believe 
the Levels hurt more than help for discussions regarding safety with the 
general public and regulators. We propose an alternate categorization of: 
driver assistance, supervised automation, autonomous operation, and vehicle 
testing. Those categories revolve around the role of the driver rather than the 
technical approach used to implement automation. 

Make sure that you know that an ODD (Operational Design Domain) is 
the set of conditions for which an AV is designed to operate, and that articles 
describing the SAE Levels typically get Level 3 wrong in some way that is 
relevant to safety. 

Autonomous vehicles have key safety challenges at every stage of what is 
often called an “autonomy pipeline” that runs from sensors through 
computations to vehicle outputs. Traditional safety-critical software 
approaches make convenient assumptions such as the external world is 
perfectly understood and there is one uniquely correct response to every 
stimulus. Moreover, software safety typically assumes someone can look at 
the software and determine if it is in fact correct. None of that really works 
out for the perception and “AI” parts of AV technology. And there is the 
matter of unknown unknowns — things we do not realize we do not know that 
might nonetheless cause a fatality while driving. It’s a can of worms.¹⁷


Chapter 3: 


A variety of different risk acceptance frameworks might be used. 
Frameworks vary by whether acceptable risk is some value relative to natural 
phenomena or is in comparison to some alternative system. Do you want to 
compare to the risk of death by lightning? Or the risk of death from a human- 
driven car instead of an AV? While comparing to the risk of human-driven 
cars is a popular starting point, it is only a starting point. 


17 If it were easy everyone would already have an AV in their garage. We are not
even close to that now.


Chapter 4:


Whenever someone says an AV is “safe” you might be astounded by the 
range of things that can mean. Indeed, if you ask someone what they mean 
by safe you might get a tangled up set of answers instead of a clean 
definition. We break the possible meanings of “safe” down into categories 
including: comparison to human driver, good roadmanship, lots of testing, 
lots of simulation, followed safety standards, is insured, is a product of a 
company that says safety is #1, and cannot be as bad as a human driver 
because it uses a computer instead. 

The thing is, maybe safety is most of those definitions all at once. We 
propose a hierarchy of safety needs to organize all the definitions.¹⁸ We also
tear into more than a dozen myths, talking points, and outright propaganda 
themes that tend to be used to confuse the topic of what safety might be and 
why we should believe AVs will improve safety on public roads. 


Chapter 5: 


The current default for “safe enough” in most discussions is at least as safe 
as a human driver. Understanding what that means requires knowing what 
kinds of harm we are comparing (fatality, injuries), on what types of roads, in 
which states, in which operational conditions, and for which drivers. By the 
way, 60-something year old drivers are the safest.¹⁹

The difficult part is that just being exactly as good as a human driver will 
not be enough because of consumer attitudes and the tendency of AV crashes 
to get more press. A simplistic utilitarian argument is that if an AV is just 
10% safer overall than human drivers it should be deployed because it will 
save lives. However, stakeholders expect computers to be much better than 
human drivers, and engineering margin needs to be included to handle 
inevitable uncertainty about safety predictions. In reality, AVs might need to 
have a predicted safety of 10 to 100 times better than humans when initially 
deployed to be viable. Anything less risks a loss of trust and backlash when 
the crashes start happening in the real world. 


Chapter 6: 


You cannot measure safety without putting numbers on things. Lagging 
metrics tend to measure outcomes, whereas leading metrics try to predict 
how safety will turn out later. However, the leading vs. lagging thing is 
relative to other metrics on a spectrum rather than a clear-cut distinction. 

There are all sorts of metrics that might be useful, including measuring
different stages of autonomy pipeline performance, engineering rigor, and
even road miles. Disengagements are probably not that helpful in predicting
safety.²⁰ A good leading metric needs to be predictive of safety rather than
just correlative or it is prone to being gamed. We like ANSI/UL 4600-style
Safety Performance Indicators (SPIs) because they are linked to a safety case
(see next chapter).

18 Maslow comes into play. Who said freshman psychology was a waste of time?

19 Bet younger readers did not see that one coming! For older readers keep this in
mind the next time someone tries to age-shame you on social media about AV safety.
It has certainly come in handy for us.

Metrics need a pass/fail threshold criterion or they cannot be used to 
answer the “safe enough” question. The numbers we need are not “more is 
better” but rather “is this number good enough to meet the safety goal?” The 
average probably is not what matters for most metrics. Safety is not about the 
99,999,999 miles where there was no fatal crash — it is about the 1 mile 
where there was a fatal crash. That means safety metrics need to be 
especially good at measuring and predicting very infrequent but high 
consequence events. Safety is about other harms too, but fatalities tend to be 
the headline issue. 


Chapter 7: 


Safety cases provide a structured argument based on evidence that a 
particular claim for safety is true. Once we have defined what we mean by 
“safe enough” we should build a safety case to convince stakeholders that a 
claim of “safe enough” is true based on a reasonable argument backed up by 
evidence. A Safety Performance Indicator (SPI) is a metric that is directly 
attached to a claim that can monitor if the claim is falsified (disproven). If 
your claim is that you never get too close to a pedestrian, the SPI metric 
looks at how often that happens,²¹ and raises an alarm if the claim is
invalidated too often.²²
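
As a rough illustration of the pedestrian example, an SPI can be sketched as a counter tied to one claim, with an alarm when the observed violation rate exceeds a budget. This is only a minimal sketch under our own assumptions: the class name, its fields, and the budget value (taken from the "once every million hours" example in the footnote) are hypothetical and are not defined by ANSI/UL 4600. A real monitor would also need a statistical treatment of very low rates rather than a simple point estimate.

```python
from dataclasses import dataclass


@dataclass
class SafetyPerformanceIndicator:
    """Tracks how often one safety-case claim is observed to be violated."""

    claim: str
    max_violations_per_hour: float  # hypothetical violation budget
    exposure_hours: float = 0.0
    violations: int = 0

    def record(self, hours: float, violations: int = 0) -> None:
        """Add operating exposure and any claim violations seen during it."""
        self.exposure_hours += hours
        self.violations += violations

    def violation_rate(self) -> float:
        """Observed violations per operating hour (0.0 before any exposure)."""
        if self.exposure_hours == 0:
            return 0.0
        return self.violations / self.exposure_hours

    def alarm(self) -> bool:
        """True when the observed rate exceeds the violation budget."""
        return self.violation_rate() > self.max_violations_per_hour


spi = SafetyPerformanceIndicator(
    claim="The AV never gets too close to a pedestrian",
    max_violations_per_hour=1e-6,  # assumed budget: once per million hours
)
spi.record(hours=500_000, violations=0)
print(spi.alarm())  # False: no violations observed so far
spi.record(hours=500_000, violations=2)
print(spi.alarm())  # True: 2 violations in 1,000,000 hours is over budget
```

Note that the budget here lives in the metric rather than in the claim. As the footnote points out, which of the two carries the "almost never" is a design choice.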

Another issue with safety cases in the real world is that the safety case will
have omissions, because it is impossible to guarantee you have fully
analyzed an open world operational environment. There is always some
safety issue that you have not thought of, or that will not even exist until after
you deploy. This means safety cases are only somewhat about
mathematically deductive proof, because they need to grapple with the
inevitability of unknown unknowns. More important are robustly supported
claims that are true — as far as we know.²³

20 Disengagements might have seemed like a good idea at the time, and kudos to
California for trying to promote data transparency for AV testing. But it’s time to
move on to something that reflects testing safety outcomes. On the other hand, crash
descriptions are proving a lot more useful as an impetus for safety transparency.
See: https://www.dmv.ca.gov/portal/vehicle-industry-services/autonomous-vehicles/autonomous-vehicle-collision-reports/

21 Remember the part where safety is about very low probability events? The
threshold might be once every million hours. But most thresholds will not be zero
because no system is perfect. Safety does not quite require perfection, but it can get
extremely close to that, so we need SPI thresholds that can cope with very low
probabilities.

22 Wait — if a claim is only a little false is that OK? The difference between a
mathematically pure safety case and the real world is right here staring you in the
face. A claim that is almost never false can be good enough. Whether the “almost
never” is built into the claim or into the metric associated with the claim is a design
choice. This is a slippery point, so we spend time on this in the chapter.


Chapter 8: 


Now we get into applying SPIs with numerous examples of the types of 
things that might be measured. We revisit a baseline safety metric target with 
more detail, and warn that SPIs need to directly match up with claims in the 
safety case to be useful. Any metric that does not directly trace to a claim in 
the safety case might be interesting, but is of questionable prediction validity. 

Leading SPIs can be defined at the system level (did the vehicle do 
something it is not supposed to — even if no crash resulted?), components (is 
your camera struggling to see things it should be able to see?), process (is 
your development team skipping required reviews, analysis, etc.?), and 
operations (are you skipping required maintenance?).

Monitoring SPIs can be a bit tricky. Even in an unsafe AV, safety SPIs 
will have violation budgets so low that many vehicles will never violate the 
SPI. This means you will need to aggregate data across vehicles to check SPI 
violation rates. Also, an SPI violation does not (necessarily) mean a vehicle 
is about to crash. Rather, it means your safety case has a defect. You need to 
fix that, but it is a much more indirect safety warning compared to “you are 
about to crash” type metrics. 
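
The need to aggregate across vehicles can be sketched with a small calculation. Everything below is made up for illustration: the fleet size, the hours per vehicle, the three violating vehicles, and the budget of one violation per million hours are our own assumptions, as is the helper name `summarize_fleet`.

```python
# Hypothetical fleet log: one (vehicle_id, operating_hours, spi_violations)
# entry per vehicle. All values are invented for illustration.
VIOLATING = {17, 402, 873}
fleet_logs = [
    (f"av-{i:04d}", 2_000.0, 1 if i in VIOLATING else 0) for i in range(1_000)
]


def summarize_fleet(logs, budget_per_hour):
    """Pool exposure and SPI violations across all vehicles in the fleet."""
    total_hours = sum(hours for _, hours, _ in logs)
    total_violations = sum(count for _, _, count in logs)
    clean_vehicles = sum(1 for _, _, count in logs if count == 0)
    rate = total_violations / total_hours if total_hours > 0 else 0.0
    return {
        "clean_vehicles": clean_vehicles,
        "fleet_rate_per_hour": rate,
        "over_budget": rate > budget_per_hour,
    }


summary = summarize_fleet(fleet_logs, budget_per_hour=1e-6)
print(summary["clean_vehicles"])  # 997 of 1,000 vehicles never violated the SPI
print(summary["over_budget"])     # True: 3 violations in 2,000,000 fleet hours
```

Looking at any single vehicle in this made-up fleet would almost always show a clean record. Only the pooled rate reveals that the budget has been exceeded.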

It seems that everyone wants to “bootstrap” safety by doing a little testing, 
having no crashes, and then using that to argue they can operate safely even 
if the total amount of testing is inadequate for a confident safety prediction. 
The math is seductive, but the math does not answer the question that really 
matters. In practice, a typical bootstrap approach amounts to getting lucky 
rather than being safe. We give an alternative approach based on measuring 
SPI failure rates and safe failure fractions rather than bootstrapping based on 
lack of crashes. 
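
To see why a short crash-free test campaign says so little, consider a simple calculation. This is a sketch under stated assumptions, not the approach developed in the chapter: we assume fatal crashes arrive independently at a constant rate per mile (a Poisson model), and we use a round target of one fatal crash per 100 million miles purely for illustration. The function names are our own.

```python
import math


def miles_needed(target_rate_per_mile, confidence=0.95):
    """Crash-free miles needed to rule out a rate at or above the target.

    Under the assumed Poisson model, P(no crash in m miles) = exp(-rate * m),
    so we need exp(-target * m) <= 1 - confidence.
    """
    return -math.log(1.0 - confidence) / target_rate_per_mile


def prob_no_crash(true_rate_per_mile, miles):
    """Chance of seeing zero crashes over the given miles at the true rate."""
    return math.exp(-true_rate_per_mile * miles)


target = 1e-8  # assumed: one fatal crash per 100 million miles

# Roughly 300 million crash-free miles for 95% confidence.
print(f"{miles_needed(target):,.0f}")

# An AV twice as dangerous as the target still has about an 82% chance of
# finishing 10 million test miles with no fatal crash at all.
print(round(prob_no_crash(2 * target, 10_000_000), 2))  # 0.82
```

Under these assumptions, a clean record over a modest number of miles is what we would expect to see most of the time even from a vehicle that misses the target by a wide margin.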


Chapter 9: 


Finally we get to figure out how to make a deployment decision. Beyond 
net risk (on average “safer than human”) are the issues of risk distribution
equity, whether best practices for other aspects of safety have been 
considered including safety standards, and how uncertainty regarding 
expected safety is being handled. Moreover, software updates need to be 
done in a way that does not undermine safety or security. 

A special aspect of ensuring overall safety is being sure that road tests are 
safe. This is significantly different than deployment safety, because safe 
outcomes for public road testing are all about the human safety driver rather 
than autonomy computers. 


23 If you are getting concerned about how we can say that an incomplete safety case
is good enough, look up “defeasible reasoning.” If that helped, then great. If not,
don’t worry — we’ll try to do better than Wikipedia when we get to that chapter.


Chapter 10:


Having discussed how to consider a deployment decision, we revisit 
themes from the book in the context of ethical concerns involved with AV 
safety, with an emphasis on the practical. The infamous Trolley Problem is 
not what you should be worrying about. Rather, the biggest issue is how 
people who have huge financial and professional incentives to deploy will 
handle a deployment safety decision — given that they are not going to be the 
ones in the vehicles when the crashes occur. 

A laundry list of other ethical concerns must be addressed for any practical 
AV system to be deployed at scale. Many of them are not talked about often, 
but they will cause practical problems if not addressed. 

Finally, we present a set of principles for ethical regulatory approaches 
that address safety, compensation, transparency, inclusion, and non- 
discrimination. Legislators in most US states are running roughshod over 
those concerns, but one can hope that will improve over time. 


Chapter 11: 


This final chapter describes other materials that might be useful, including 
free online videos we have recorded on topics relevant to this book. 


1.3. Terminology and abbreviations 


Here is a list of abbreviations and key terms used in the book for 
reference. More precise definitions for some terms are provided in the 
chapters. These are quick reference definitions to remind the reader of the 
essential part of the definition. 


Abbreviations and terms 


• Acceptable safety — A system is acceptably safe if it has a very small 
probability of substantive harm, follows best practices for safety 
engineering, and presents a risk vs. benefit tradeoff that accounts for all 
stakeholders who might suffer direct or indirect harm. 

• ADS — Automated Driving System. The computer system that drives an 
autonomous vehicle. 

• AEB — Automatic Emergency Braking. A computer-based function that 
automatically applies brakes to mitigate an impending collision. 

• AHS — Automated Highway Systems. An AV technology demo project 
from the 1990s. 

• AI — Artificial Intelligence. This is so messy we’re not going to even 
attempt a definition.²⁴ When you hear “AI” in the context of AVs, usually 
the term machine learning should have been used instead. 

• ALARP — As Low As Reasonably Practicable. A risk framework 
approach requiring reduction of risk to the degree practicable. For our 
purposes this is reasonably equivalent to ALARA (As Low As 
Reasonably Achievable) and SFAIRP (So Far As Is Reasonably 
Practicable). 

• ALKS — Automated Lane Keeping System, UNECE #157. A standard 
for implementing a traffic jam pilot automation feature.²⁵ 

• ANSI/UL 4600 — A safety standard to ensure that an AV safety case has 
considered everything it should.²⁶ 

• ASIL — Automotive Safety Integrity Level. An automotive-specific 
variant of the concept of a SIL. 

• AUHD — Average Unimpaired Human Driver. A potential baseline 
reference for driving safety. 

• Autonowashing — Overstating the autonomy capability of a vehicle or 
technological approach. 

• AV — Autonomous Vehicle. A vehicle operating without a requirement 
for continuous human safety supervision. 

• BRB — Big Red Button. An emergency stop button or the like to trigger 
an urgent, but hopefully safe, shutdown of an automated system. 

• DDT — Dynamic Driving Task. Normal driving, whether done by a 
person or a machine. 

• Defeasible reasoning — An approach to argument that is rationally 
compelling but potentially falsifiable due to incomplete information. 

• DMV — Department of Motor Vehicles. A state organization that licenses 
drivers and manages vehicle registrations. 

• DOT — Department of Transportation. US state and federal organizations 
that regulate transportation safety. 

• Fallback — Reacting to a vehicle failure, such as pulling to the side of the 
road. 

• Falsified — A claim that was thought to be true has been proven false by 
observed data. Any claim in a sound safety case is believed to be true, 
but is potentially falsifiable in the face of yet-to-be-encountered 
unknowns. 

• FMVSS — Federal Motor Vehicle Safety Standard(s). US test-based 
standards for specific minimum required safety functionality. 


24 Spoiler: an AV does not “think” like a person, even if we indulge in occasional 
anthropomorphizing descriptions. 

25 See: https://unece.org/transport/documents/2021/03/standards/un-regulation-no-157-automated-lane-keeping-systems-alks 

26 See: https://users.ece.cmu.edu/~koopman/ul4600/index.html 
• FN — False Negative: an object is there, but the system fails to detect it. 

• FP — False Positive: there is no object, but the system detects an object. 

• GAMAB — “Globalement Au Moins Aussi Bon” which describes a risk 
framework in which one thing is overall at least as good as another 
comparable type of system. (See also: PRB) 

• Geofence — An ODD limitation to specified locations and/or routes. 
ODDs typically address other operational limitations beyond just 
geofencing. 

• GSN — Goal Structuring Notation. A defined notation for safety cases. 

• Harm — Injury or fatality inflicted upon people. See also PDO. 

• IIHS — Insurance Institute for Highway Safety. A US nonprofit funded 
by auto insurance companies. 

• ISO 21448 — An automotive standard for “safety of the intended 
function” (SOTIF) that encompasses driver assistance features and AVs. 

• ISO 26262 — An automotive functional safety standard that applies to 
conventional vehicles as well as AVs. 

• KPI — Key Performance Indicator. A metric used to emphasize an 
important aspect of AV performance or some aspect of a company’s 
process performance that might — or might not — be relevant to safety. 

• Lagging metrics — Metrics that are gathered regarding safety outcomes 
from operating the AV. 

• Leading metrics — Metrics that are gathered to predict safety before loss 
events occur. 

• Loss event — An AV incident involving damage to property or harm. 
Used instead of the term “accident.” A typical loss event involves a 
crash, but other types of loss events are possible. 

• MEM — Minimum Endogenous Mortality. A risk framework based on 
determining whether the risk of a system is significantly higher than the 
background exposure to other risks of everyday life. 

• ML — Machine Learning. An approach to computation based on using 
training by example to set up a computationally simulated neural 
network. This is a specific technology used by most AVs that is often 
what is being referred to as “AI” (see: artificial intelligence). 

• Moral Crumple Zone — The practice of assigning responsibility and 
blame for an automated system failure to some conveniently available 
person, especially if that person could not reasonably have been expected 
to prevent a loss event. See section 10.2.2. 

• MRC — Minimal Risk Condition. Stopping the vehicle after performing 
fallback. There is no actual requirement for the risk to be “minimal” in 
any sense as currently defined, but there is a requirement that it involves 
stopping the vehicle. This term is often used in regulations in a way that 
intends risk of harm while in an MRC to be acceptably low. 
• NHTSA — National Highway Traffic Safety Administration. The US 
Department of Transportation administration responsible for vehicle 
safety and managing recalls. 

• NSC — National Safety Council. A US nonprofit public service 
organization promoting health and safety. 

• OD — Operational Domain. The portion of the real world that the AV 
operates in. The ODD is an approximate model of the OD. 

• ODD — Operational Design Domain. The conditions under which an AV 
is intended to operate. This should be an acceptable model of the OD. 

• OEDR — Object and Event Detection and Response. Detecting objects 
and other road situations, then changing own vehicle behavior in 
response. Example: steering to avoid collision with an object. 

• OEM — Original Equipment Manufacturer. A company that integrates 
and sells cars. Contrast with automotive suppliers who provide 
components to the OEM. 

• PDO — Property Damage Only. A crash severity category in which no 
harm was done to people, but some objects were damaged. 

• Permissiveness — How aggressively an AV can move within its ODD 
without exceeding its safety limits. 

• PRA — Probabilistic Risk Assessment. Assessing risk as a sum of 
probabilities times consequences. 

• PRB — Positive Risk Balance. An AV should be no worse than a human 
driver. 

• RSS — Responsibility Sensitive Safety. A strategy for attaining provably 
blame-free AV behavior based on a Newtonian physics approach. 

• SAE — The organization formerly known as the Society of Automotive 
Engineers. Now SAE is just short for “SAE International.” 

• SAE J3016 — A terminology standard for automated vehicles that is 
commonly mistaken for (but is most definitely NOT) a safety standard. 

• SAE J3018 — A standard covering human safety driver aspects of road 
testing safety. 

• SAE Levels — A six-level categorization (Levels 0 to 5) designating the 
functionality assigned to automation equipment in a vehicle. The Levels 
are defined in SAE J3016. 

• Safety Case — A structured argument, supported by evidence, that 
supports a claim that an AV is acceptably safe to deploy. 

• SIL — Safety Integrity Level. Used to determine the engineering rigor to 
be applied to achieve acceptable risk mitigation for a safety-critical 
system or feature. 

• SMS — Safety Management System. A system of metrics used to monitor 
for, identify, and correct safety issues. 

• SOTIF — Safety Of The Intended Function. Associated with a 
methodology for identifying safety-related performance and requirement 
insufficiencies for driver assistance and AV technology. See ISO 21448. 
• SPI — Safety Performance Indicator (pronounced S-P-I rather than 
“spy”). A metric tied to a claim in a safety case and associated with a 
threshold beyond which the claim has been falsified. An SPI violation 
occurs when the SPI’s threshold has been exceeded by the metric value. 

• TN — True Negative: there is no object and the system detects no object. 

• TP — True Positive: an object is there and is recognized. 

• TTC — Time To Collision. A risk metric for how long it would be until a 
collision if vehicles were not maneuvered to avoid that collision. 

• VMT — Vehicle Miles Traveled, often in millions (e.g., 100M VMT is 
100 million miles in total traveled by a set of vehicles). 

• VSSA — Voluntary Safety Self-Assessment. A report to NHTSA 
submitted by some AV companies disclosing some information relevant 
to plans for safety. 
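
Three of the terms above (PRA, TTC, and SPI) each describe a small calculation, so a brief sketch may help fix the ideas. The Python below is illustrative only. The names `Hazard`, `pra_risk`, `time_to_collision`, and `spi_violated` are hypothetical and are not taken from any standard. The TTC function assumes a constant closing speed between two vehicles in a single lane, which is one simple reading of the definition and not the only possible one.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Hazard:
    """One row of a hypothetical risk table (all values are assumptions)."""
    name: str
    probability: float  # assumed chance of the loss event per unit of exposure
    consequence: float  # assumed severity score, in arbitrary units


def pra_risk(hazards: Iterable[Hazard]) -> float:
    """PRA: total risk as the sum of probability times consequence."""
    return sum(h.probability * h.consequence for h in hazards)


def time_to_collision(gap_m: float, closing_speed_mps: float) -> float:
    """TTC in seconds if neither vehicle maneuvers.

    Assumes a constant closing speed. Returns infinity when the gap
    is not shrinking, since no collision is then predicted.
    """
    if closing_speed_mps <= 0:
        return float("inf")
    return gap_m / closing_speed_mps


def spi_violated(metric_value: float, threshold: float) -> bool:
    """SPI violation: the metric value has exceeded the SPI's threshold."""
    return metric_value > threshold


if __name__ == "__main__":
    # Made-up numbers, chosen only to show the arithmetic.
    table = [Hazard("rear-end", 0.5, 10.0), Hazard("sideswipe", 0.25, 4.0)]
    print("PRA risk:", pra_risk(table))
    print("TTC (s):", time_to_collision(gap_m=30.0, closing_speed_mps=10.0))
    print("SPI violated:", spi_violated(metric_value=3.0, threshold=2.0))
```

Note that the SPI check uses a strict comparison, so a metric value sitting exactly at its threshold does not count as a violation. That matches the wording "exceeded" in the SPI definition, though a real safety case would need to state this choice explicitly.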


Numerical conventions: 


“K” — kilo/thousand (1,000), e.g., 100K is 100,000 
kph — kilometers per hour 

“M” — million (1,000,000), e.g., 80M is 80,000,000 
mph — miles per hour 


352 PREVIEW Koopman 


Sadly, it is common for Web resources to go stale. If one of the cited 
references becomes unavailable, try entering its URL into the archive 
server here: https://archive.org/ 